202 PART 5 Looking for Relationships with Correlation and Regression

You can study correlation and regression for many years and not master all of it.

In this chapter, we cover the kinds of correlation and regression most often

encountered in biological research and explain the differences between them. We

also explain some terminology used throughout Parts 5 and 6.

Correlation: Estimating How Strongly Two Variables Are Associated

Correlation refers to the extent to which two variables are related. In the following

sections, we describe the Pearson correlation coefficient and discuss ways to analyze

correlation coefficients.

Lining up the Pearson correlation coefficient

The Pearson correlation coefficient is represented by the symbol r and measures

the extent to which two variables (X and Y) tend to lie along a straight line when

graphed. If the variables have no relationship, r will be 0, and the points will be

scattered across the graph. If the relationship is perfect, the points will lie exactly

along a straight line, and r will be either:

» 1: If the variables have a direct or positive relationship, meaning when one

goes up, the other goes up, or

» –1: If the variables have an inverse or negative relationship, meaning when one

goes up, the other goes down

Correlation coefficients can be positive (indicating upward-sloping data) or negative

(indicating downward-sloping data). Figure 15-1 shows what several different

values of r look like.
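
To make these values concrete, here is a minimal sketch in Python (our choice for illustration; nothing in this chapter depends on a particular language). It computes r from the standard textbook formula: the sum of the cross-products of the deviations from the means, divided by the square root of the product of the two sums of squared deviations. The function name pearson_r and the small data sets are made up for this example.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient from the standard definition."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Illustrative data only: points lying exactly on a straight line
x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2 * v + 1 for v in x])     # perfect direct relationship
r_down = pearson_r(x, [10 - 3 * v for v in x])  # perfect inverse relationship
print(round(r_up, 6), round(r_down, 6))
```

Because both made-up Y series are exact linear functions of X, the two results land at the extremes described above: 1 for the upward-sloping line and –1 for the downward-sloping one.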

Note: The Pearson correlation coefficient measures the extent to which the points

lie along a straight line. If your data follow a curved line, the r value may be low or

zero, as shown in Figure 15-2. All three graphs in Figure 15-2 have the same

amount of random scatter in the points, but they have quite different r values.

Pearson r is based on a straight-line relationship and understates the strength of

association (or is even zero) if the relationship is nonlinear. So, you shouldn’t

interpret r = 0 as evidence that two variables are unassociated or independent. It

may indicate only the lack of a straight-line relationship between them.
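
The warning about curved data can be checked with a small sketch, again in Python and again with made-up numbers. Here Y is determined exactly by X (Y is X squared, with no random scatter at all), yet r comes out at zero when the X values are symmetric around zero, and below 1 when only the rising half of the curve is sampled. The helper pearson_r is a hypothetical name that applies the standard textbook formula for r.

```python
import math

def pearson_r(x, y):
    """Pearson r: cross-product sum over the root of the squared-deviation sums."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# U-shaped data: X symmetric around zero, Y = X squared
x_sym = [-3, -2, -1, 0, 1, 2, 3]
y_sym = [v ** 2 for v in x_sym]
r_sym = pearson_r(x_sym, y_sym)

# Rising half of the same curve only
x_half = [0, 1, 2, 3, 4, 5, 6]
y_half = [v ** 2 for v in x_half]
r_half = pearson_r(x_half, y_half)

print(f"U-shaped curve: r = {r_sym:.3f}")
print(f"Rising half:    r = {r_half:.3f}")
```

In the U-shaped case the downward-sloping left half and the upward-sloping right half cancel, so r is zero even though knowing X tells you Y exactly. Plotting the data before computing r is the simplest way to catch this situation.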